Abstract 

In pay-per-click sponsored search auctions, which are currently used extensively by search engines, the 
auction for a keyword involves a certain number of advertisers (say k) competing for available slots (say 
m) to display their ads. This auction is typically conducted for a number of rounds (say T). There are 
click probabilities $\mu_{ij}$ associated with each agent-slot pair. The goal of the search engine is to maximize 
the social welfare of the advertisers, that is, the sum of values of the advertisers. The search engine does not 
know the true values advertisers have for a click to their respective ads and also does not know the click 
probabilities $\mu_{ij}$. A key problem for the search engine therefore is to learn these click probabilities during 
the T rounds of the auction and also to ensure that the auction mechanism is truthful. Mechanisms for 
addressing such learning and incentives issues have recently been introduced and are aptly referred to as 
multi-armed-bandit (MAB) mechanisms. When m = 1, characterizations for truthful MAB mechanisms are 
available in the literature and it has been shown that the regret for such mechanisms will be $\Theta(T^{2/3})$. In 
this paper, we seek to derive a characterization in the realistic but non-trivial general case when m > 1 and 
obtain several interesting results. Our contributions include: (1) When the $\mu_{ij}$s are unconstrained, we prove that 
any truthful mechanism must satisfy strong pointwise monotonicity and show that the regret will be $\Theta(T)$ 
for such mechanisms. (2) When the clicks on the ads follow a certain click precedence property, we show that 
weak pointwise monotonicity is necessary for MAB mechanisms to be truthful. (3) If the search engine has 
a certain coarse pre-estimate of $\mu_{ij}$ values and wishes to update them during the course of the T rounds, we 
show that weak pointwise monotonicity and weakly separatedness are necessary and sufficient conditions for 
the MAB mechanisms to be truthful. (4) If the click probabilities are separable into agent-specific and slot-specific 
terms, we provide a characterization of MAB mechanisms that are truthful in expectation. 



1 Introduction 

Whenever a user searches any set of keywords on a search engine, along with the search results, called organic 
results, the search engine displays advertisements related to those keywords on the right side of the organic 
results. In pay-per-click sponsored search auctions, the search engine charges an advertiser for displaying her 
ad only if a user clicks on her ad. The decision regarding which ads are to be displayed and their respective 
order is based on the bids submitted by the advertisers indicating the maximum amount they are willing to 
pay per click. To perform any optimizations, such as maximizing social welfare or maximizing revenue to the 
search engine, the true valuations of the advertisers are needed. Being rational, the advertisers may actually 
manipulate their bids and therefore a primary goal of the search engine is to design an auction for which it is 
in the best interest of each advertiser to bid truthfully irrespective of the bids of the other advertisers. Such an 
auction is said to be Dominant Strategy Incentive Compatible (DSIC), or truthful. 






These auctions also take into account crucially the click probabilities or clickthrough rates (CTRs). Given 
an agent i and a slot j, the click probability $\mu_{ij}$ is the probability with which the ad of agent i will be clicked 
if the ad appears in slot j. If the search engine knows the CTRs, then its problem is only to design a truthful 
auction. However, the search engine may not know the CTRs beforehand. Thus the problem of the search 
engine is twofold: (1) learn the CTR values, and (2) design a truthful auction. Typically, the same set of agents 
compete for the given set of keywords. The search engine can exploit this fact to learn the CTRs by initially 
displaying ads by various advertisers. Also note that it is reasonable to assume that the advertisers do not revise their bids 
frequently. If the advertisers were bidding true values, the search engine's problem would have been the same 
as that of a multi-armed bandit (MAB) problem [7] for learning the CTRs. Since the agents may not report 
their true values, the problem of the search engine can be described as one of designing an incentive compatible 
MAB mechanism. In the initial rounds, the search engine displays advertisements from all the agents to learn 
the CTRs. This phase is referred to as the exploration phase. Then it uses the information gained in these rounds 
to maximize the social welfare. The latter phase is referred to as the exploitation phase. The search engine will invariably 
lose a part of the social welfare during the exploration phase. The difference between the social welfare the search 
engine would have achieved with the knowledge of CTRs and the actual social welfare achieved by a MAB 
mechanism is referred to as regret. Thus, regret analysis is also important while designing a MAB mechanism. 

1.1 Related Work 

The problems where the decision maker has to optimize his total reward based on gained information as well as 
gain knowledge about the available rewards are referred to as Multi-Armed Bandit (MAB) problems. The MAB 
problem was first studied by Robbins [7] in 1952. After his seminal work, MAB problems have been extensively 
studied for regret analysis and convergence rates. Readers are referred to [2] for regret analysis in finite time 
MAB problems. However, when a mechanism designer has to consider strategic behavior of the agents, these 
bounds on regret no longer apply. Recently, Babaioff, Sharma, and Slivkins [3] have derived a characterization 
for truthful MAB mechanisms in the context of pay-per-click sponsored search auctions if there is only a single 
slot for each keyword. They have shown that any truthful MAB mechanism must have at least $\Omega(T^{2/3})$ worst 
case regret and also proposed a mechanism that achieves this regret. Here T indicates the number of rounds 
for which the auction is conducted for a given keyword, with the same set of agents involved. 

Devanur and Kakade have also addressed the problem of designing truthful MAB mechanisms for 
pay-per-click auctions with a single sponsored slot. Though they have not explicitly attempted a characterization 
of truthful MAB mechanisms, they have derived similar results on payments as in [3]. They have also obtained 
a bound of $\Theta(T^{2/3})$ on the regret of a MAB mechanism. Note that the regret analyzed by Devanur and Kakade is 
regret in the revenue to the search engine, whereas the regret analysis in [3] is for the social welfare of the advertisers. In this paper, unless 
explicitly stated, when we refer to regret, we mean loss in social welfare as compared to the social welfare that could 
have been obtained with known CTRs. 

In both of the above papers, only a single slot for advertisements is considered and therefore the practical 
appeal is limited. Generalization of their work to the more realistic case of multiple sponsored slots is non-trivial 
and our paper seeks to fill this research gap. 

Prior to the above two papers, Gonen and Pavlov [5] had addressed the issue of unknown CTRs in multiple 
slot sponsored search auctions and proposed a specific mechanism. Their claim that their mechanism is truthful 
in expectation has been contested by [21 S]. Also, Gonen and Pavlov do not provide any characterization for 
truthful multi-slot MAB mechanisms. 

1.2 Our Contributions 

In this paper, we extend the results of Babaioff, Sharma, and Slivkins [3] and Devanur and Kakade [1] to 
the non-trivial general case of two or more sponsored slots. The precise question we address is: which MAB 
mechanisms for multi-slot pay-per-click sponsored search auctions are dominant strategy incentive compatible 
(or truthful)? We describe our specific contributions below. 

In the first and most general setting (Section 3.1), we assume no knowledge of click-through rate ($\mu_{ij}$) values 
or any relationships among $\mu_{ij}$ values. We refer to this setting as the "unknown and unconstrained CTR" 
setting. Here we show that any truthful mechanism must satisfy a highly restrictive property which we refer to 
as the strong pointwise monotonicity property. We show that all mechanisms satisfying this property will, however, 
exhibit a high regret, which is $\Theta(T)$. This immediately motivates our remaining Sections 3.2, 3.3 and 3.4, 
where we explore the following variants of the general setting which yield more reasonable characterizations. 

First, in Section 3.2, we consider a setting where the realization is restricted according to a property which 
we call the Higher Slot Click Precedence property (a click in a lower slot will automatically imply that a click is 
received if the same ad is shown in any higher slot). For this setting, we provide a weaker necessary condition 
than strong pointwise monotonicity. Finding a necessary and sufficient condition however remains open. 

In Section 3.3, we provide a complete characterization of MAB mechanisms which are truthful in expectation 
under a stochastic setting where a coarse estimate of $\mu_{ij}$ is known to the auctioneer and to the agent i, perhaps 
from some database of past auctions. Under this setting, the auctioneer updates his database of $\mu_{ij}$ values 
based on the observed clicks, thereby improving his estimate and maximizing revenue. 

Finally, in Section 3.4, we derive a complete characterization of truthful multi-slot MAB mechanisms for a 
stochastic setting where we assume that the $\mu_{ij}$s are separable into agent-dependent and slot-dependent parts. 
Here, unlike the previous setting, we do not assume existence of any information on agent-dependent click 
probabilities. 

For all the above multi-slot sponsored search auction settings, we show that the slot allocation in truthful 
mechanisms must satisfy some notion of monotonicity with respect to the agents' bids and a certain weak 
separation between exploration and exploitation. 

Our results are summarized in Table 1. 



Number of Slots (m) | Learning Parameter (CTR)     | Solution Concept        | Allocation Rule                                                      | Regret
--------------------+------------------------------+-------------------------+----------------------------------------------------------------------+------------------------------------------
m = 1 [3]           | Unrestricted                 | DSIC                    | Pointwise monotone and exploration separated                         | $\Theta(T^{2/3})$
m > 1               | Unrestricted                 | DSIC                    | Strongly pointwise monotone and weakly separated                     | $\Theta(T)$
                    | Higher Slot Click Precedence | DSIC                    | Weakly pointwise monotone and weakly separated (necessary condition) | Regret analysis not carried out
                    | CTR pre-estimates available  | Truthful in expectation | Weakly pointwise monotone and weakly separated                       | Regret analysis not carried out
                    | Separable CTR                | Truthful in expectation | Weakly pointwise monotone and weakly separated                       | $\Theta(T^{2/3})$ (experimental evidence)

Table 1: Results 



Our approach and line of attack in this paper follow that of [3], where the authors use the notions of pointwise 
monotonicity, weakly separatedness and exploration separatedness quite critically in characterizing truthfulness. 
Since our paper deals with the general problem of which theirs is a special case, these notions continue to 
play an important role in our paper. However, there are some notable differences as explained below. We 
generalize their notion of pointwise monotonicity in two ways. The first notion we refer to as strong pointwise 
monotonicity and the second one as weak pointwise monotonicity. In addition to this, we introduce the key 
notions of Influential Set, i-influentiality and Strongly influential. We use these new notions to define a non- 
trivial generalization of their notion of weakly separatedness, to which we, however, continue to associate the 
same name. The characterization of truthful mechanisms for a single parameter was provided by [1] |^. For 
deriving payments to be assigned to the agents for truthful implementation, we use the approach in [J [B] . 

In Section 21 we provide some simple experimental results on regret analysis. We conclude the paper in 
Section [SI 

2 System Setup and Notation 



In the auction considered, there are k agents and m ad slots (k > m). Each agent has a single advertisement 
that she wants to display and a private value $v_i$, which is her value per click on the ad. The auctioneer, that 
is, the search engine, wishes to distribute the ads among these slots. These advertisements have certain click 
probabilities which depend upon the agent as well as the slot with which the agent is associated. Let $\mu_{ij}$ be 
the probability of an ad of agent i receiving a click in slot j. Now, the goal of the search engine is to assign 
these agents to the slots in such a way that the social welfare, which is the total value received by the bidders, 
is maximized. However, there are two problems: (i) the search engine does not know $v_i$, the valuations of the 
agents, and (ii) the search engine may not know the click probabilities $\mu_{ij}$. 

So, the goal of the search engine is: (i) to design a DSIC auction in which it is in the agents' interest to bid 
their true values, $v_i$s, and (ii) to estimate $\mu_{ij}$. We consider multi-round auctions, where the search engine displays 
the various advertisements repeatedly over a large number of rounds. The mechanism uses the initial rounds in 
an explorative fashion to learn $\mu_{ij}$ and then uses the other rounds exploitatively to gain value. 

The system works as follows. At the start of the auction, each agent submits a sealed bid $b_i$. Based on 
these bids and the click information from previous rounds, the mechanism decides to allocate each ad slot to a 
particular agent and then displays the m chosen ads. The user can now click on any number of these ads and 
this information gets registered by the mechanism for future rounds. At the end of T rounds, depending on 
the bids submitted by the agents and the number of clicks received by each agent, the agents have to make a 
certain payment $P_i$ to the mechanism. 

Note: $P_i$ and $C_i$ are functions of $b$ and $\rho$. Whenever the arguments are clear from the context, we just refer to 
them as $P_i$ and $C_i$. 

A mechanism can be formally defined as the tuple (A, P) where A is the allocation rule specifying the slot 
allocation and P is the payment rule. 

The important notation used in the paper is summarized in Table 2. Following this, we define the terms 
used in this paper. 

2.1 Important Notions and Definitions 

Definition 2.1 (Realization $\rho$) We define a realization $\rho$ as a vector $(\rho(1), \rho(2), \ldots, \rho(T))$ where 
$\rho(t) = [\rho_{ij}(t)]_{K \times M}$ is the click information in round t. $\rho_{ij}(t) = 1$ if agent i's ad receives a click in slot j in round t, 
else 0. 

It is to be noted that the mechanism observes only those $\rho_{ij}(t)$ where $A_{ij}(b, \rho, t) = 1$. 

Definition 2.2 (Clickwise Monotonicity) We call an allocation rule A clickwise monotone if for a fixed 
$(b_{-i}, \rho)$, the number of clicks, $C_i(b_i, b_{-i}, \rho)$, is a 
non-decreasing function of $b_i$. That is, $\frac{dC_i}{db_i} \geq 0\ \forall (b_{-i}, \rho)$. 

Definition 2.3 (Weak Pointwise Monotonicity) We call an allocation rule weak pointwise monotone if, 
for any given $(b_{-i}, \rho)$ and bid $b_i^+ > b_i$, $A_{ij}((b_i, b_{-i}), \rho, t) = 1 \Rightarrow 
A_{ij'}((b_i^+, b_{-i}), \rho, t) = 1$ for some slot $j' \leq j$, $\forall t$. 

Definition 2.4 (Influential Set) Given a bid vector $b$, a realization $\rho$ and round t, an influential set $I(b, \rho, t)$ 
is the set of all agent-slot allocation pairs $(i, j)$ such that (i) $A_{ij}(b, \rho, t) = 1$ and (ii) a change in $\rho_{ij}(t)$ will 
result in a change in the allocation in a future round. Round t is referred to as an influential round. Agent i is referred 
to as an influential agent and j as an influential slot w.r.t. round t. 
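
To make Definition 2.4 concrete, the following minimal Python sketch computes an influential set by brute force: it toggles the click bit of each pair allocated in round t and checks whether any later allocation changes. The `Rule` callback signature, the helper names `run` and `influential_set`, the toy rule `explore_then_exploit`, and the 0-based round indices are assumptions made for this illustration; they are not taken from the paper.

```python
from typing import Callable, Dict, List, Set, Tuple

Pair = Tuple[int, int]        # (agent, slot)
Clicks = Dict[Pair, int]      # click bits rho_ij(t) for one round
Allocation = Dict[int, int]   # agent -> slot
# Hypothetical allocation rule: (bids, round, observed clicks so far) -> allocation
Rule = Callable[[List[float], int, List[Clicks]], Allocation]


def run(rule: Rule, bids: List[float], rho: List[Clicks]) -> List[Allocation]:
    """Play all rounds; the rule only sees click bits of allocated pairs."""
    observed: List[Clicks] = []
    allocations: List[Allocation] = []
    for t in range(len(rho)):
        alloc = rule(bids, t, observed)
        allocations.append(alloc)
        observed.append({(i, j): rho[t].get((i, j), 0) for i, j in alloc.items()})
    return allocations


def influential_set(rule: Rule, bids: List[float],
                    rho: List[Clicks], t: int) -> Set[Pair]:
    """Pairs allocated in round t whose toggled click bit changes a later allocation."""
    base = run(rule, bids, rho)
    result: Set[Pair] = set()
    for pair in base[t].items():
        toggled = [dict(r) for r in rho]
        toggled[t][pair] = 1 - rho[t].get(pair, 0)
        if run(rule, bids, toggled)[t + 1:] != base[t + 1:]:
            result.add(pair)
    return result


def explore_then_exploit(bids: List[float], t: int,
                         observed: List[Clicks]) -> Allocation:
    """Toy rule (illustration only): explore in round 0, then exploit."""
    if t == 0:
        return {0: 1, 1: 2}  # agent 0 in slot 1, agent 1 in slot 2
    # Keep agent 0 on top only if her ad was clicked in round 0.
    return {0: 1, 1: 2} if observed[0][(0, 1)] == 1 else {1: 1, 0: 2}


rho = [{(0, 1): 1, (1, 2): 0}, {}, {}]
print(influential_set(explore_then_exploit, [1.0, 1.0], rho, 0))
```

In this toy instance only the pair (agent 0, slot 1) is influential in round 0, because the rule never consults any other click bit.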

Definition 2.5 (i-Influential Set) We define the i-influential set $N(b, \rho, i, t) \subseteq I(b, \rho, t)$ as the set of all 
influential agent-slot pairs $(i', j')$ such that a change in $\rho_{i'j'}(t)$ will change the allocation of agent i in some future 
round. 

Definition 2.6 (Strongly Influential) We call a slot-agent pair $(i^*, j^*)$ strongly influential in round t w.r.t. 
the realization $\rho(t)$ if changing the realization (toggling) in the bit $\rho_{i^*j^*}(t)$ changes the allocation in a future 
round. We call such a triple $(i^*, j^*, t)$ strongly i-influential if one of its influenced agents is i. 






Symbol                  | Meaning
------------------------+--------------------------------------------------------------------------------
$K$                     | $= \{1, 2, \ldots, k\}$, set of agents
$M$                     | $= \{1, 2, \ldots, m\}$, set of slots
$i$                     | Index of an agent, $i = 1, 2, \ldots, k$
$j$                     | Index of a slot, $j = 1, 2, \ldots, m$
$T$                     | Total number of rounds
$t$                     | A particular round, $t \in \{1, 2, \ldots, T\}$
$A_{ij}(t)$             | $= 1$ if agent i is allocated slot j in round t; $= 0$ otherwise
$A(t)$                  | $= (A_{ij}(t))_{i \in K, j \in M}$
$A$                     | $= (A(1), A(2), \ldots, A(T))$, allocation rule
$\rho_{ij}(t)$          | $= 1$ if agent i gets a click in slot j in round t; $= 0$ otherwise
$\rho(t)$               | $= (\rho_{ij}(t))_{i \in K, j \in M}$
$\rho$                  | $= (\rho(1), \rho(2), \ldots, \rho(T))$
$v_i$                   | Agent i's valuation of a click to her ad
$b_i$                   | Bid by agent i
$b$                     | Bid vector, indicating bids of all the agents, $= (b_i, b_{-i}) = (b_1, b_2, \ldots, b_k)$
$C_i(b, \rho)$          | Total number of clicks obtained by agent i in T rounds
$P_i(b, \rho)$          | Payment made by agent i
$P(b, \rho)$            | $= (P_1(\cdot), P_2(\cdot), \ldots, P_k(\cdot))$, payment rule
$U_i(v_i, b, \rho)$     | Utility of agent i in T rounds, $= v_i C_i(b, \rho) - P_i(b, \rho)$
$b_i^+$                 | A real number $> b_i$
$\alpha_i$              | Click probability associated with agent i
$\beta_j$               | Click probability associated with slot j
$\mu_{ij}$              | The probability that an ad of agent i receives a click when the agent is allotted slot j
$N(b, \rho, i, t)$      | Set of slot-agent pairs in round t that influence agent i in some future rounds
CTR                     | Click Through Rate (Click Probability)
DSIC                    | Dominant Strategy Incentive Compatible

Table 2: Notation 



Definition 2.7 (Weakly Separated) We call an allocation rule weakly separated if for a given $(b_{-i}, \rho)$ and 
two bids of agent i, $b_i$ and $b_i^+$ where $b_i < b_i^+$, $N((b_i, b_{-i}), \rho, i, t) \subseteq N((b_i^+, b_{-i}), \rho, i, t)$. 

This means that when an agent i increases her bid, while the other parameters are kept fixed, the allocation in 
the originally influential slots does not change; only new influential agent-slot pairs can get added. We continue 
to use the definitions of Normalized Mechanism and Non-degeneracy from [3]. With these preliminaries, we are now 
ready to characterize truthful MAB mechanisms for various settings in the next section. 

3 Characterization of Truthful MAB Mechanisms 

Before stating our results, we prove a minor claim that we will use to develop our characterizations. We will 
use this claim implicitly in our proofs. 

Claim 3.1 Given $(b, (\rho(1), \rho(2), \ldots, \rho(t-1)))$, if $(i^*, j^*)$ is i-influential in round t, then $\exists \rho^*(t)$ such that $(i^*, j^*)$ 
is also strongly i-influential w.r.t. $\rho^*(t)$ in round t. 

Proof: 

Suppose the claim is false. Let the i-influential set of slots in round t be 
$N(b, \rho, i, t) = \{(i_1, j_1), (i_2, j_2), \ldots, (i_l, j_l), (i^*, j^*)\}$. 
$N(b, \rho, i, t) \neq \emptyset$ since it has at least one element, $(i^*, j^*)$. Since we have assumed our claim to 
be false, $(i^*, j^*)$ is not strongly i-influential for any realization $(\rho_{i_1 j_1}(t), \rho_{i_2 j_2}(t), \ldots, \rho_{i_l j_l}(t))$; that is, the allocation 
of agent i in future rounds is the same whether $\rho_{i^* j^*}(t)$ is 0 or 1 for every given $(\rho_{i_1 j_1}(t), \rho_{i_2 j_2}(t), \ldots, \rho_{i_l j_l}(t))$. This 
means that the allocation of agent i is the same in future rounds for all realizations $(\rho_{i_1 j_1}(t), \rho_{i_2 j_2}(t), \ldots, \rho_{i_l j_l}(t), \rho_{i^* j^*}(t))$. 
But this contradicts the fact that $\{(i_1, j_1), (i_2, j_2), \ldots, (i_l, j_l), (i^*, j^*)\}$ is the set of i-influential slot-agent pairs 
in round t. This proves our claim. 

□ 

In our characterization of truthfulness under various settings, we show that a truthful allocation rule A must 
be weakly separated. Though the proofs look similar, there are subtle differences in each of the following 
subsections. In our proofs, we start with the assumption that a truthful allocation rule A is not weakly separated. 
That is, 

$\exists\, b_i < b_i^+,\ b_{-i},\ \rho,\ t \ \ni\ N((b_i, b_{-i}), \rho, i, t) \not\subseteq N((b_i^+, b_{-i}), \rho, i, t)$    (1) 

Subsequently, we show that this leads to a contradiction in each of the subsections, implying the necessity of 
weakly separatedness. 



3.1 Unknown and Unconstrained CTRs 

In this setting, we do not assume any previous knowledge of the CTRs although we do assume that such CTRs 
exist. Here, we show that any mechanism that is truthful under such a setting must follow some very rigid 
restrictions on its allocation rule. 

Definition 3.1 (Strong Pointwise Monotonicity) An allocation rule is said to be strongly pointwise 
monotone if it satisfies: For any fixed $(b_{-i}, \rho)$, if an agent i with bid $b_i$ is allocated a slot j in round t, then $\forall b_i^+ > b_i$, 
she is allocated the same slot j in round t. That is, if agent i receives a slot in round t, then she receives 
the same slot for any higher bid. For any lower bid, she may either receive the same slot or lose the 
impression. 
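
The difference between Definition 2.3 and Definition 3.1 can be checked numerically on a finite grid of bids. The sketch below is only an illustration under stated assumptions: `allocate(bid)` is a hypothetical callback returning agent i's slot in one fixed round (1 is the top slot, `None` means no impression) with $(b_{-i}, \rho, t)$ held fixed, and the function name and toy rules are not from the paper.

```python
from typing import Callable, Optional, Sequence

Slot = Optional[int]  # 1 is the top slot; None means no impression


def is_pointwise_monotone(allocate: Callable[[float], Slot],
                          bids: Sequence[float],
                          strong: bool) -> bool:
    """Check weak (Def. 2.3) or strong (Def. 3.1) pointwise monotonicity
    for one agent and one round, on a finite grid of bids."""
    last: Slot = None  # slot won at the largest lower bid that got an impression
    for bid in sorted(bids):
        slot = allocate(bid)
        if last is not None:
            if slot is None:
                return False   # a higher bid lost the impression
            if strong and slot != last:
                return False   # strong: the slot must stay the same
            if not strong and slot > last:
                return False   # weak: the slot may only improve
        if slot is not None:
            last = slot
    return True


def threshold_rule(bid: float) -> Slot:
    """Toy rule: higher bids reach better slots."""
    if bid >= 2.0:
        return 1
    if bid >= 1.0:
        return 2
    return None


def fixed_slot_rule(bid: float) -> Slot:
    """Toy rule: the bid decides only whether slot 2 is won."""
    return 2 if bid >= 1.0 else None


grid = [0.5, 1.0, 1.5, 2.0, 3.0]
print(is_pointwise_monotone(threshold_rule, grid, strong=False))
print(is_pointwise_monotone(threshold_rule, grid, strong=True))
print(is_pointwise_monotone(fixed_slot_rule, grid, strong=True))
```

The threshold rule is weakly but not strongly pointwise monotone, since a higher bid moves the agent from slot 2 to slot 1; the fixed-slot rule satisfies both notions.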

Theorem 3.1 Let (A, P) be a deterministic, non-degenerate mechanism for the MAB, multi-slot sponsored 
search auction, with unconstrained and unknown $\mu_{ij}$. Then, mechanism (A, P) is DSIC iff A is strongly 
pointwise monotone and weakly separated. Further, the payment scheme is given by 

$P_i(b_i, b_{-i}; \rho) = b_i C_i(b_i, b_{-i}; \rho) - \int_0^{b_i} C_i(x, b_{-i}; \rho)\,dx.$ 
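
As a small numerical illustration of this payment rule, the sketch below evaluates it for a piecewise-constant click function. It is a minimal example under assumptions that are not in the paper: `clicks(x)` is a hypothetical callback standing for $C_i(x, b_{-i}; \rho)$, `breakpoints` lists the bids at which it may jump, and `step_clicks` is a made-up click profile.

```python
from typing import Callable, Sequence


def payment(clicks: Callable[[float], float], bid: float,
            breakpoints: Sequence[float]) -> float:
    """P_i = b_i * C_i(b_i) - integral_0^{b_i} C_i(x) dx
    for a click function that is constant between breakpoints."""
    cuts = sorted({0.0, bid, *[c for c in breakpoints if 0.0 < c < bid]})
    integral = 0.0
    for lo, hi in zip(cuts, cuts[1:]):
        # C_i is constant on (lo, hi), so sampling the midpoint is exact.
        integral += clicks((lo + hi) / 2.0) * (hi - lo)
    return bid * clicks(bid) - integral


def step_clicks(x: float) -> float:
    """Made-up profile: 0 clicks below bid 1, 3 clicks on [1, 2), 5 from 2 on."""
    if x >= 2.0:
        return 5.0
    if x >= 1.0:
        return 3.0
    return 0.0


def utility(value: float, bid: float) -> float:
    """U_i = v_i * C_i - P_i for the made-up profile."""
    return value * step_clicks(bid) - payment(step_clicks, bid, [1.0, 2.0])


print(payment(step_clicks, 3.0, [1.0, 2.0]))   # 3*5 - (0 + 3 + 5) = 7.0
print(utility(1.5, 1.5), utility(1.5, 3.0))    # truthful bid vs. overbid
```

In this example an agent with value 1.5 obtains utility 1.5 by bidding truthfully and only 0.5 by bidding 3, which is consistent with the incentive property the payment rule is meant to provide when $C_i$ is non-decreasing in the bid.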



Proof: 

The proof is organized as follows. In step 1, we show the necessity of the payment structure. In step 2, we show 
the necessity of strong pointwise monotonicity. Step 3 proves the necessity of weakly separatedness. Finally in 
step 4, we prove that the above payment scheme in conjunction with strong pointwise monotonicity and weakly 
separatedness imply that the mechanism is DSIC. 

Step 1: The utility structure for each agent $i \in K$ is 

$U_i(v_i, (b_i, b_{-i}), \rho) = v_i C_i((b_i, b_{-i}), \rho) - P_i((b_i, b_{-i}), \rho)$ 

The mechanism is DSIC iff it is the best response for each agent to bid truthfully. That is, by bidding 
truthfully, each agent's utility is maximized. Thus 

(A, P) is DSIC $\iff \frac{\partial U_i}{\partial b_i}\Big|_{b_i = v_i} = 0$ and $\frac{\partial^2 U_i}{\partial b_i^2}\Big|_{b_i = v_i} \leq 0\ \forall v_i$. 

From the first order equation, we obtain 

$v_i \frac{dC_i}{db_i} = \frac{dP_i}{db_i}$ 

We need $P_i(0) = 0$ for normalization. Integrating the above and by second order conditions, we need $\frac{dC_i}{db_i} \geq 0$, 
which is the clickwise monotonicity condition. 
Thus, for (A, P) to be DSIC, we need 

$P_i(b_i, b_{-i}; \rho) = b_i C_i(b_i, b_{-i}; \rho) - \int_0^{b_i} C_i(x, b_{-i}; \rho)\,dx$ 

and $\frac{dC_i}{db_i} \geq 0\ \forall ((b_i, b_{-i}), \rho)$    (2) 
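
The integration step in Step 1 can be spelled out as follows. This is the standard integration-by-parts argument, written under the assumption that $C_i$ is differentiable in the bid; the arguments $(b_{-i}, \rho)$ are suppressed for readability.

```latex
\begin{align*}
\frac{dP_i}{dx}(x) &= x\,\frac{dC_i}{dx}(x)
  && \text{(first-order condition, holding at every } v_i = x\text{)} \\
P_i(b_i) - P_i(0) &= \int_0^{b_i} x\,\frac{dC_i}{dx}(x)\,dx
  = \Big[\,x\,C_i(x)\,\Big]_0^{b_i} - \int_0^{b_i} C_i(x)\,dx \\
P_i(b_i) &= b_i\,C_i(b_i) - \int_0^{b_i} C_i(x)\,dx
  && \text{(using } P_i(0) = 0\text{)} \\
\frac{\partial^2 U_i}{\partial b_i^2}\bigg|_{b_i = v_i}
  &= -\frac{dC_i}{db_i}\bigg|_{b_i = v_i} \le 0
  \iff \frac{dC_i}{db_i}\bigg|_{b_i = v_i} \ge 0
\end{align*}
```

The last line follows from $\partial U_i / \partial b_i = (v_i - b_i)\, dC_i/db_i$, which is obtained by substituting the first line into the derivative of the utility.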

Step 2: We first prove the necessity of strong pointwise monotonicity by contradiction. We have seen from (2) 
that $\frac{dC_i}{db_i} \geq 0\ \forall ((b_i, b_{-i}), \rho)$ is necessary for DSIC of A. We show that if A is not strongly pointwise monotone, 
then there exists some allocation and realization $\rho$ for which $\frac{dC_i}{db_i} < 0$. If A is not strongly pointwise monotone, 
there exists $(b_i, b_i^+, b_{-i}, \rho, t)$ such that 

$A_{ij_1}((b_i, b_{-i}), \rho, t) = 1$ and $A_{ij_2}((b_i^+, b_{-i}), \rho, t) = 1$, where $j_1 \neq j_2$    (3) 

Over all such counter-examples, choose the one with the minimum t. By this choice, we ensure that in this 
example, $\forall t' < t$, we have $A_{ij}(b_i, t') = A_{ij}(b_i^+, t')$. The only difference occurs in round t. Now, consider the 
game instance where $\rho_{ij_1}(t) = 1$, $\rho_{ij_2}(t) = 0$, $\rho_{ij}(\tau) = 0\ \forall \tau > t$. The occurrence of such $\rho$ has non-zero 
probability. Now, under $(b_{-i}, \rho)$, agent i has the same allocation and the same number of clicks until round 
$(t - 1)$ independent of whether she bids $b_i$ or $b_i^+$. However, in round t with bid $b_i$, she receives a click and with 
bid $b_i^+$ she does not, implying for this case that $\frac{dC_i}{db_i} < 0$. This violates the clickwise monotonicity requirement. So, 
strong pointwise monotonicity is indeed a necessary condition for truthful implementation of MAB mechanisms 
under this setting. 

Step 3: We prove the necessity of the weakly separatedness condition by contradiction. That is, we assume (1). 

Over all such possible counter-examples of $b_i$, $b_i^+$, $b_{-i}$, $\rho$, choose the one with the least t, and let $(i^*, j^*)$ denote 
an agent-slot pair in round t that violates the containment in (1). Now, either $i^* = i$ or 
$i^* \neq i$. 

Case 1: ($i^* = i$). Consider the realization $\rho'$ differing from $\rho$ only in round t in the entry $\rho_{i^*j^*}$. That is, 
$\rho'_{i^*j^*}(t) = 1 - \rho_{i^*j^*}(t)$ and $\rho'_{i'j}(t'') = \rho_{i'j}(t'')\ \forall (i', j, t'') \neq (i^*, j^*, t)$. We can assume $\rho_{ij}(\tau) = 0\ \forall \tau > t$, as the 
clicks in future rounds do not affect decisions in the current round. Since $(i^* = i, j^*)$ is not part of the allocation 
in round t under the original bid $b_i$, the difference between $\rho$ and $\rho'$ is not observed by the mechanism. However, 
the prices computed by the payment scheme (2) to agent i differ under these two realizations. (See [3] for details 
on why the payments are different.) 

Case 2: $i^* \neq i$. Now, choose $\rho(t)$ to be that realization for which $(i^*, j^*, t)$ is strongly i-influential. Now, 
let t' be the first round i-influenced by $(i^*, j^*, t)$. Consider the realization $\rho'$ which differs from $\rho$ in that 
$\rho'_{i^*j^*}(t) = 1 - \rho_{i^*j^*}(t)$. Agent i's allocation and click information differs only in round t' under the two different 
realizations $\rho$ and $\rho'$. Now, let $A_{ij_1}((b_i, b_{-i}), \rho, t') = 1$ and $A_{ij_2}((b_i, b_{-i}), \rho', t') = 1$, that is, agent i gets slot $j_1$ in 
round t' with bid $b_i$ under realization $\rho$ and slot $j_2$ under realization $\rho'$. Here, since the two differ only in 
$\rho_{i^*j^*}(t)$, by the strongly i-influential nature of $(i^*, j^*, t)$ under this realization we have $j_1 \neq j_2$. Without loss of 
generality, let $j_1 < j_2$ (or $j_1$ be the better slot, since it is possible that one of the realizations leads to no slot 
allocation). Now, we choose $\rho(t') = \rho'(t')$ in the following manner: $\rho_{ij_1}(t') = \rho'_{ij_1}(t') = 1$, $\rho_{ij_2}(t') = \rho'_{ij_2}(t') = 0$ 
and $\rho_{ij}(\tau) = \rho'_{ij}(\tau) = 0\ \forall \tau > t'$. We can make such an arbitrary choice since the realization from round t' 
onwards does not affect the allocation in round t. 

Under this choice of $\rho$ and $\rho'$, agent i clearly gets more clicks under realization $\rho$ than $\rho'$ with bid $b_i$. 
Now, agent i's number of clicks varies with her bid based on only her allocation in round t', which changes 
only if with bid x the pair $(i^*, j^*)$ is i-influential in round t with t' the earliest influenced round. With any such 
bid x, under realization $\rho'$, agent i will either get slot $j_2$ in round t' or no slot at all (by strong pointwise 
monotonicity), which in turn means that she will never get a click under realization $\rho'$ in round t'. Hence, 
$C_i((x, b_{-i}), \rho) \geq C_i((x, b_{-i}), \rho')\ \forall x \leq b_i^+$. Additionally, we have $C_i((x_0, b_{-i}), \rho) > C_i((x_0, b_{-i}), \rho')$. Using these 
relations and the non-degeneracy condition (see [3] for details), we have $P_i((b_i^+, b_{-i}), \rho) < P_i((b_i^+, b_{-i}), \rho')$. $\rho'$ 
only differs from $\rho$ in the unobserved bit $(i^*, j^*, t)$. Hence, the mechanism fails to assign a unique payment to 
agent i, leading to a contradiction. This shows the necessity of weakly separatedness. 

Step 4: Finally, we show that strong pointwise monotonicity and weakly separatedness are sufficient conditions 
for clickwise monotonicity and computability of the payments and hence for truthfulness. Suppose A is a strongly 
pointwise monotone and weakly separated allocation rule. So, it clearly satisfies clickwise monotonicity. Now, 
by the weakly separatedness condition, we already have all the information required to calculate the allocation 
of agent i in every round for every bid $x < b_i$. This is because the i-influential set for bids $x < b_i$ is a subset 
of the known influential set, so we already have all the possible click information required for the i-influential 
sets. Additionally, by the strong pointwise monotonicity condition, we know that for each bid $x < b_i$ and 
each round either agent i keeps the same slot she had in the observed game instance $(b_i, b_{-i}; \rho)$ or loses the 
impression altogether, that is, does not get a click. Hence, we have all the information required to compute 
$P_i(b_i, b_{-i}; \rho) = b_i C_i(b_i, b_{-i}; \rho) - \int_0^{b_i} C_i(x, b_{-i}; \rho)\,dx$. This completes the sufficiency part of the theorem. 

□ 

Implications of Strong Pointwise Monotonicity 

For a given round t, if an agent i is allocated a slot j, then by the definition of strong pointwise monotonicity 
she receives the same slot for any higher bid that she places. If she lowers her bid, she may either retain the 
slot j, or lose the impression entirely. This leads to the strong restriction that an agent's bid can only decide 
whether or not she obtains an impression, and not which slot she actually gets. As we shall show below, this 
restriction has serious implications on the regret incurred by any truthful mechanism. 

Regret Estimate 

In the single slot case it is a known result that the worst case regret is $\Theta(T^{2/3})$ [3]. So, for the multi-slot case, 
the regret is $\Omega(T^{2/3})$. We show here that the worst case regret generated in the multi-slot general setting by a 
truthful mechanism is in fact $\Theta(T)$. We show this for the 2 slot, 3 agent case with an intuitive argument, which 
can be generalized. 

Consider a setting with two slots and three competing agents, that is, m = 2, k = 3. Let the agents be $A_1$, 
$A_2$ and $A_3$. By Theorem 3.1, any truthful mechanism has to be strongly pointwise monotone. That is, in any 
round, the bids of the agents only determine which agents will be displayed and not the slots they obtain. 

Suppose $A_3$'s bid $b_3 < \min(b_1, b_2)$, in addition to $A_3$ having low CTRs. In this case, any mechanism that grants 
$A_3$ an impression $\Theta(T)$ times will have regret $\Theta(T)$. 

So, we can assume that $A_3$'s ad gets an impression only a very small number of times when compared with T. 
Thus, ads by $A_1$ and $A_2$ will appear $\Theta(T)$ times. In each round, $A_1$ will get either slot 1 or slot 2 independent 
of her bid, while the other slot is assigned to $A_2$. 

In any strongly pointwise monotone mechanism, $A_1$ is assigned either slot 1 or slot 2 $\Theta(T)$ times. 

Without loss of generality, we assume that $A_1$ is assigned slot 1 $\Theta(T)$ times. So, the allocation of (slot 1, 
slot 2) to ($A_1$, $A_2$) is made $\Theta(T)$ times. Consider a game instance where this is not the welfare maximizing 
assignment, that is, the relation $(\mu_{11} b_1 + \mu_{22} b_2) < (\mu_{12} b_1 + \mu_{21} b_2)$ holds true. Since the slot allocation does 
not depend on the individual bids, such an instance can occur. In such a setting, ($A_2$, $A_1$) would have been 
the optimal assignment. As a result, each round having the allocation ($A_1$, $A_2$) incurs constant non-zero regret. 
Since such an allocation occurs $\Theta(T)$ times, the mechanism has a worst case regret of $\Theta(T)$. Hence any truthful 
mechanism under the unrestricted CTR setting exhibits a high $\Theta(T)$ regret. 

□ 
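
The argument above can be illustrated with a small numerical instance. The following Python sketch is only an example under assumptions: the bids and CTR values are invented for illustration, agents and slots are indexed from 0, and the helper names `welfare` and `per_round_regret` are not from the paper. It computes the constant per-round welfare gap of a bid-independent slot order, which accumulates linearly in T.

```python
from itertools import permutations
from typing import Sequence


def welfare(assignment: Sequence[int], bids: Sequence[float],
            mu: Sequence[Sequence[float]]) -> float:
    """Expected per-round welfare when slot j is given to agent assignment[j]."""
    return sum(mu[agent][slot] * bids[agent]
               for slot, agent in enumerate(assignment))


def per_round_regret(assignment: Sequence[int], bids: Sequence[float],
                     mu: Sequence[Sequence[float]]) -> float:
    """Gap between the best assignment of agents to slots and the given one."""
    agents = range(len(bids))
    best = max(welfare(a, bids, mu)
               for a in permutations(agents, len(assignment)))
    return best - welfare(assignment, bids, mu)


# Invented instance with 3 agents and 2 slots; mu[i][j] is the CTR of agent i in slot j.
bids = [1.0, 1.0, 0.1]
mu = [[0.2, 0.5],    # agent 0 (A_1)
      [0.5, 0.2],    # agent 1 (A_2)
      [0.01, 0.01]]  # agent 2 (A_3): low bid and low CTRs

fixed = (0, 1)  # slot 1 to A_1 and slot 2 to A_2, regardless of bids
gap = per_round_regret(fixed, bids, mu)
T = 1000
print(gap, T * gap)  # constant gap per round, so regret grows linearly in T
```

Here $\mu_{11} b_1 + \mu_{22} b_2 = 0.4 < 1.0 = \mu_{12} b_1 + \mu_{21} b_2$, so every round with the fixed order loses 0.6 in expected welfare relative to the swapped order.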

Since the strong monotonicity condition places such a severe restriction on A and also leads to a very high 
regret, in the following sections we explore some relaxations of the assumption that the $\mu_{ij}$s are unrelated. With 
such settings, which are in fact practically quite meaningful, we are able to prove more encouraging results. 

3.2 Higher Slot Click Precedence 

This setting is similar to the general one discussed above in that we do not assume any knowledge about the 
CTRs. However, we impose a restriction on the realization p that it follows higher slot click precedence defined 
below. 

Definition 3.2 A realization $\rho$ is said to follow Higher Slot Click Precedence if $\forall i \in K$, $\forall t = 1, 2, \ldots, T$, 

$\rho_{ij_1}(t) = 1 \Rightarrow \rho_{ij_2}(t) = 1\ \forall j_2 < j_1$ 

Higher slot click precedence implies that if an agent i obtains a click in slot $j_1$ in round t, then in that round 
she would also receive a click in any higher slot $j_2$. This assumption is in general valid in the real world since any given 
user (fixed by round t) who clicks on a particular ad when it is displayed in a lower slot would definitely click 
on the same ad if it was shown in a higher slot. 
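
A direct way to read Definition 3.2 is that, for each agent and round, the click bits listed from the top slot downwards must be a run of 1s followed by a run of 0s. The sketch below checks this property; the nested-list layout `rho[t][i][j]` with slot index 0 as the top slot and the function name are assumptions made for this illustration.

```python
from typing import Sequence

# rho[t][i][j] is the click bit of agent i in slot j in round t;
# slot index 0 is the top (highest) slot.
Realization = Sequence[Sequence[Sequence[int]]]


def follows_higher_slot_click_precedence(rho: Realization) -> bool:
    """Return True if a click in any slot implies a click in every higher slot."""
    for round_bits in rho:
        for agent_bits in round_bits:
            seen_zero = False
            for bit in agent_bits:
                # A 1 below a 0 means a lower slot is clicked
                # while some higher slot is not, violating the property.
                if bit == 1 and seen_zero:
                    return False
                if bit == 0:
                    seen_zero = True
    return True


ok = [[[1, 1, 0], [0, 0, 0]]]   # one round, two agents, three slots
bad = [[[0, 1, 0], [1, 0, 0]]]  # agent 0: click in slot 2 but not in slot 1
print(follows_higher_slot_click_precedence(ok))
print(follows_higher_slot_click_precedence(bad))
```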

We show, under this setting, that weak pointwise monotonicity and weakly separatedness are necessary 
conditions for truthfulness. They are, however, not sufficient conditions. Clearly, strong pointwise monotonicity 
and weakly separatedness will still be sufficient conditions. A weaker sufficient condition for truthfulness under 
this setting is still elusive. 

Implications of the Assumption 

Observe that a slot-agent pair $(i, j)$ is influential in some round t only if changing the realization in the entry 
$\rho_{ij}(t)$ for some realization $\rho$ results in a change in allocation in some future round. Crucial to the influentiality 
is the fact that $\rho_{ij}(t)$ can change. 

Now, consider the following situation: it has been observed that in the game instance $((b_i, b_{-i}), \rho)$, we 
have $\rho_{ij_1}(t) = 0$, where agent i obtains slot $j_1$ in round t. We are interested in the game instance $((x, b_{-i}), \rho)$ 
where agent i gets slot $j_2 > j_1$, where $x < b_i$, and in knowing whether $(i, j_2)$ is an influential pair in round t 
for some influenced agent. Now, since $\rho_{ij_1}(t) = 0$ and $j_1 < j_2$, by our defining assumption, we conclude that 
$\rho_{ij_2}(t) = 0\ \forall x < b_i$. Hence, our mechanism knows that in all the relevant cases, the realization in the given 
slot-agent pair never changes. Hence, $(i, j_2)$ cannot be an influential pair for any $j_2 > j_1$ in round t. We will 
use this observation in the proof of the necessity characterization. 

Proposition 3.1 Consider the setting in which the realization $\rho$ follows Higher Slot Click Precedence. Let $(A, P)$
be a deterministic non-degenerate DSIC mechanism for this setting. Then the allocation rule $A$ must be weakly
pointwise monotone and weakly separated. Further, the payment scheme is given by,



The proof for the payment scheme is identical to that in Theorem 3.1. We prove the necessity of weak pointwise
monotonicity and weakly separatedness.

Step 1: We first prove the necessity of weak pointwise monotonicity, in a very similar fashion to that of the
necessity of strong pointwise monotonicity in Theorem 3.1. The crucial difference is that, while constructing $\rho$, we
have to ensure that it satisfies the higher slot click precedence. Suppose $A$ is truthful but not weakly pointwise
monotone, that is, $\exists\, (b_i, b_i^+, b_{-i}, \rho, t)$ such that

$A_{i j_1}((b_i, b_{-i}), \rho, t) = 1$ and $A_{i j_2}((b_i^+, b_{-i}), \rho, t) = 1$ for some $j_1 < j_2$. Over all such examples, choose the one with
the least $t$. By this choice, we ensure that in this example, $\forall t' < t$, we have $A_{ij}(b_i, t') = A_{ij}(b_i^+, t')$. The only
difference occurs in round $t$. Now, consider the game instance where $\rho_{i j_1}(t) = 1$ and $\rho_{i j_2}(t) = 0$. Such a realization
has a non-zero probability of occurrence. Now, under $(b_{-i}, \rho)$, agent $i$ gets the same allocation and the same
number of clicks until round $(t - 1)$, independent of whether she bids $b_i$ or $b_i^+$. However, in round $t$ with bid $b_i$
she gets a click and with bid $b_i^+$ she does not, implying for this case that her total number of clicks strictly
decreases as her bid increases, which violates clickwise monotonicity. This leads to a contradiction.
So, weak pointwise monotonicity is a necessary condition. If $A$ is not strongly pointwise monotone, it does
not necessarily violate clickwise monotonicity. That is, for truthful $A$, it may be possible that $A_{i j_1}((b_i, b_{-i}), \rho, t) = 1$ and
$A_{i j_2}((b_i^+, b_{-i}), \rho, t) = 1$ where $j_2 < j_1$. Thus, for $A$ to be truthful, strong pointwise monotonicity may not be
necessary.

Next, we prove the necessity of the weakly separatedness condition. Again, we prove this claim by contradiction.
We follow the same steps as in the proof of Theorem 3.1, except that we need to justify our choices of $\rho$, as it
should satisfy the higher slot click precedence property.







Proof:
Case 1: $i^* = i$. Here, as in the previous proof, we choose $\rho'$ such that $\rho'_{i^* j^*}(t) = 1 - \rho_{i^* j^*}(t)$ and, $\forall (i', j, t'') \neq (i^*, j^*, t)$,
$\rho'_{i' j}(t'') = \rho_{i' j}(t'')$. We need to show that this choice of $\rho$ does not contradict the higher slot click
precedence. Now, from our assumption, $(i^*, j^*)$ is an influential pair in round $t$. From our observation in
Section 3.2 it follows that $\rho_{i^* j^*}(t)$ and $\rho'_{i^* j^*}(t)$ must be able to take any value from $\{0, 1\}$. It also forces that
$\forall j < j^*$, $\rho_{i^* j}(t) = 1$. Since $(i^*, j^*)$ is not part of the allocation in round $t$ under the original bid $b_i^+$, the
difference between $\rho$ and $\rho'$ is not observed by the mechanism. However, the payments by agent $i$ differ under
these two realizations (see [3] for details on why the payments are different).
Case 2: $i^* \neq i$. Here, over all examples with $(i^*, j^*)$ an influential pair in round $t$ with influenced agent $i$ in the
earliest influenced round $t'$, we choose the one with minimum xq. In this case, our choice of $\rho(t)$ and $\rho'(t)$ is the
same as in Theorem 3.1, while our choice of $\rho(t') = \rho'(t')$ differs. Again, the choice of $\rho(t)$ and $\rho'(t)$ is a valid
assumption by the influentiality of $(i^*, j^*)$. Without loss of generality, let $j_1 < j_2$, where $j_1$ and $j_2$ are defined
as in Theorem 3.1. Now, we choose $\rho(t') = \rho'(t')$ in the following manner: $\rho_{ij}(t') = \rho'_{ij}(t') = 1$ $\forall j \le j_1$ and
$\rho_{ij'}(t') = \rho'_{ij'}(t') = 0$ $\forall j' > j_1$. We can make such an arbitrary choice since the realization from round $t'$
onwards does not affect the allocation in round $t'$. Now, the rest of the arguments from the proof of Theorem
3.1 follow and lead to the contradiction that the mechanism cannot distinguish between $\rho$ and $\rho'$, yet it needs
to assign different payments under these realizations. This shows the necessity of weakly separatedness.

□ 

3.3 When CTR Pre-estimates are Available 

In this setting, we assume the existence of some previous database or pre-estimate of CTR values but no
restriction on $\rho$. That is, $\mu_{ij} = X_{ij} / Y_{ij}$, where $X_{ij}$ is the number of clicks obtained by agent $i$ in slot $j$ out of
the $Y_{ij}$ times she obtained the slot $j$ over all past auctions. Here, in general, $\mu_{i1} \ge \mu_{i2} \ge \ldots \ge \mu_{im}$. For our
characterization, we assume that each $\mu_i = (\mu_{i1}, \mu_{i2}, \ldots, \mu_{im})$ is known to the agent $i$ and the auctioneer.
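
As a small illustration of how such a pre-estimate could be tabulated from historical counts, the following Python sketch computes the ratio of clicks to impressions entry by entry. The function name, the nested-list layout, and the choice to return `None` for a slot the agent has never occupied are assumptions made for this example only.

```python
from typing import List, Optional, Sequence


def ctr_pre_estimates(
    clicks: Sequence[Sequence[int]],
    impressions: Sequence[Sequence[int]],
) -> List[List[Optional[float]]]:
    """Return mu[i][j] = X_ij / Y_ij from past click and impression counts.

    clicks[i][j] is X_ij and impressions[i][j] is Y_ij. An entry is None
    when agent i was never shown in slot j (an assumption of this sketch).
    """
    return [
        [(x / y if y > 0 else None) for x, y in zip(x_row, y_row)]
        for x_row, y_row in zip(clicks, impressions)
    ]
```

These values are meant to be computed once, from past auctions, and then held fixed during the $T$ rounds under study.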

In this setting, the auctioneer uses explorative rounds to improve his estimate of the CTRs and updates 
the database. Then, he makes use of his new knowledge of the CTRs in the exploitative rounds. The payment 
scheme, however, only makes use of the old CTR matrix. Under this scheme, we derive the conditions required
for a mechanism to be truthful in expectation over $\mu$, defined as follows.

Definition 3.3 (Truthful in Expectation) A mechanism is said to be truthful in expectation over $\mu$, the
CTR pre-estimate, if each of the agents believes that the number of clicks she obtains is indeed $\sum_j \mu_{ij} A_{ij}$,
which is the number of clicks she will obtain if the CTR pre-estimate is perfectly accurate.

3.3.1 Fairness 

For this characterization, we need the notion of fair allocation rules, as defined below. 

Definition 3.4 (Fair Allocation) Consider two game instances $((b_i, b_{-i}), \rho)$ and $((b_i^+, b_{-i}), \rho)$ having the same
slot-agent-round triplets $(i', j', t')$ as strongly $i$-influential. Let $(i^*, j^*, t)$ be such a triplet with the smallest $t'$ in
which $i$ is influenced. Consider the realization $\rho'$ differing from $\rho$ only in this influential element $\rho_{i^* j^*}(t)$. Then,
the allocation rule $A$ is said to be fair if for every such pair of games it happens that

$$\sum_j \mu_{ij} A_{ij}((b_i, b_{-i}), \rho, t') > \sum_j \mu_{ij} A_{ij}((b_i, b_{-i}), \rho', t') \;\Leftrightarrow\; \sum_j \mu_{ij} A_{ij}((b_i^+, b_{-i}), \rho, t') > \sum_j \mu_{ij} A_{ij}((b_i^+, b_{-i}), \rho', t')$$

The intuition behind fair allocations is that changing the realization only in a fixed strongly $i$-influential
slot generally changes agent $i$'s allocation in a predictable fashion independent of her own bid, either improving
her slot or worsening it in the earliest influenced round, irrespective of the allocation or realization in the rest
of the game. For example, if agent $i$'s chief competitor, agent $i'$, is strongly $i$-influential, then $i'$ not getting a
click in the influential round will generally mean that agent $i$ will go on to get a better slot than if agent $i'$ got
a click, independent of $b_i$.

3.3.2 Truthfulness Characterization

Here, the expected utility for the agent $i$ is

$$U_i(v_i, b, \rho) = \sum_{t=1}^{T} \sum_{j=1}^{m} \mu_{ij} A_{ij}(b, \rho, t)\, v_i - P_i(b, \rho) \qquad (4)$$

Proposition 3.2 Let $(A, P)$ be a normalized mechanism under this setting. Then, the mechanism is truthful
in expectation over $\mu$ iff $A$ is weakly pointwise monotone and the payment rule is given by

$$P_i(b, \rho) = \sum_{t=1}^{T} \sum_{j=1}^{m} \mu_{ij} \left\{ b_i A_{ij}(b, \rho, t) - \int_{0}^{b_i} A_{ij}(x, b_{-i}, \rho, t)\, dx \right\}$$

and payments are computable.
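
To make the payment rule concrete, the following Python sketch evaluates it numerically for a single agent, with $b_{-i}$ and the realization held fixed. It is only an illustration under stated assumptions: the callback `alloc(x, t)`, the function names, and the midpoint Riemann sum used for the integral are choices made for this example and are not taken from the original text.

```python
from typing import Callable, Optional, Sequence


def payment(
    bid: float,
    mu_i: Sequence[float],
    alloc: Callable[[float, int], Optional[int]],
    num_rounds: int,
    steps: int = 3000,
) -> float:
    """Approximate P_i for one agent, with b_{-i} and the realization fixed.

    alloc(x, t) is a hypothetical callback returning the slot index that the
    agent receives in round t when she bids x, or None if she gets no slot.
    """
    dx = bid / steps
    total = 0.0
    for t in range(num_rounds):
        slot_at_bid = alloc(bid, t)
        for j, mu in enumerate(mu_i):
            a_at_bid = 1.0 if slot_at_bid == j else 0.0
            # Midpoint Riemann sum for the integral of A_ij(x, ...) over [0, bid].
            integral = sum(
                dx for k in range(steps) if alloc((k + 0.5) * dx, t) == j
            )
            total += mu * (bid * a_at_bid - integral)
    return total


def example_alloc(x: float, t: int) -> Optional[int]:
    """Toy weakly pointwise monotone rule: higher bids get weakly better slots."""
    if x >= 2.0:
        return 0  # highest slot
    if x >= 1.0:
        return 1  # second slot
    return None   # no slot


example_payment = payment(3.0, [0.5, 0.2], example_alloc, num_rounds=1)
```

In this toy instance the expected number of clicks is 0 for bids below 1, 0.2 for bids in $[1, 2)$ and 0.5 for bids of at least 2, so the rule charges approximately $3 \cdot 0.5 - (0.2 + 0.5) = 0.8$ at a bid of 3.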



Proof: 

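In Step 1, we prove the necessity and sufficiency of the payment structure. For the mechanism to be implemented,
we need to compute the payments of all the agents uniquely. That is, the $P_i$'s need to be computable for all agents
$i$. In Step 2, we show that weak pointwise monotonicity is equivalent to the second order condition, which is clickwise
monotonicity in the context of this paper.

Step 1: The expected utility of an agent $i$ is given by (4).

The mechanism $(A, P)$ is truthful $\Leftrightarrow$ $\left.\frac{\partial U_i}{\partial b_i}\right|_{b_i = v_i} = 0$ and $\left.\frac{\partial^2 U_i}{\partial b_i^2}\right|_{b_i = v_i} \le 0$ $\forall v_i$.

From the first order condition we get,

$$P_i(b, \rho) = \sum_{t=1}^{T} \sum_{j=1}^{m} \mu_{ij} \left\{ b_i A_{ij}(b, \rho, t) - \int_{0}^{b_i} A_{ij}(x, b_{-i}, \rho, t)\, dx \right\}$$

From the second order condition, we need,

$$v_i \sum_{t=1}^{T} \sum_{j=1}^{m} \mu_{ij} \frac{\partial A_{ij}}{\partial b_i} \ge 0 \qquad (5)$$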

Step 2: We show that (5) $\Leftrightarrow$ weak pointwise monotonicity.

(i) It is obvious that weak pointwise monotonicity $\Rightarrow$ $\sum_t \sum_j \mu_{ij} \frac{\partial A_{ij}}{\partial b_i} \ge 0$, since an increase in $b_i$ under a weakly
pointwise monotone $A$ would result in a better slot allocation for agent $i$. This, in turn, would result in an
increase in $\sum_j \mu_{ij} A_{ij}$ in each round.

(ii) Now we prove the converse. Suppose $A$ is not weakly pointwise monotone. That is, $\exists\, i, b_i, b_i^+, b_{-i}, \rho, \mu, t$ such that
$A_{ij}(b_i, b_{-i}, \rho, \mu, t) = 1$ and $A_{ij'}(b_i^+, b_{-i}, \rho, \mu, t) = 1$ where $j' > j$. Consider the smallest such $t$. Allocation in
this round does not depend upon the realization of this round or of future rounds. We consider the instance of
the game where $\rho_{ij}(t) = 1$ and $\rho_{ij'}(t) = 0$ and $t$ is the last round. Such an instance has a non-zero probability
and for this instance, $\sum_j \mu_{ij} \frac{\partial A_{ij}}{\partial b_i} < 0$. This proves the equivalence claim.

□ 

Note, it is crucial that each $\mu_{ij}$ is a previously known constant and cannot be defined as $\mu_{ij} = X_{ij} / Y_{ij}$ based
on the clicks in the current $T$ rounds post facto. If we do so, $X_{ij} / Y_{ij}$ can change with the allocation of agent
$i$ in a particular game and hence, $\mu_{ij}$ would become a function of $b_i$ and the mechanism would no longer be
truthful.



Footnote: The characterization in this section would hold even if the $\mu_{ij}$ are arbitrary weights. However, while using arbitrary weights,
the mechanism may charge some agents more than their actual willingness to pay. Also, the regret in revenue, that is, the loss in revenue
to the search engine, will trivially be $\Theta(T)$.

For truthful implementation, the payments need to be computable, and computing the payments may involve
the unobserved part of $\rho$. In the next theorem, we show that weakly separatedness is necessary and sufficient
for computation of these payments. So, combining the computability of payments with the above proposition, we
get the following.

Theorem 3.2 Let $(A, P)$ be a mechanism for this stochastic multi-round auction setting where $A$ is a
non-degenerate, deterministic and fair allocation rule. Then, $(A, P)$ is truthful in expectation over $\mu$ iff $A$ is weakly
pointwise monotone and weakly separated and the payment scheme is given by

$$P_i(b, \rho) = \sum_{t=1}^{T} \sum_{j=1}^{m} \mu_{ij} \left\{ b_i A_{ij}(b, \rho, t) - \int_{0}^{b_i} A_{ij}(x, b_{-i}, \rho, t)\, dx \right\}$$



Proof: 

This setting/characterization works best with old advertisers who have already taken part in a large number
of auctions. As we have already proved Proposition 3.2, we just need to show that weakly separatedness is in
fact a necessary and sufficient condition for the computability of payments, that is, computability of
$\sum_{j=1}^{m} \mu_{ij} \int_{0}^{b_i} A_{ij}(x, b_{-i}, \rho, t)\, dx$ for each agent $i$.

Step 1: We first provide the proof for the sufficiency of weakly separatedness. Suppose $A$ is weakly separated.
The mechanism observes and knows all allocations and the observed realization for the game instance carried out
with the original bid vector $(b_i, b_{-i})$. Specifically, it knows $N((b_i, b_{-i}), i, \rho, t)$ for all rounds $t$ and the respective
realizations in these slots. Now, in the game instance $(x, b_{-i})$ where $x < b_i$, by weakly separatedness, we have
$N((x, b_{-i}), i, \rho, t) \subseteq N((b_i, b_{-i}), i, \rho, t)$. This means that the allocation in $i$-influential slots for the game instance
$((x, b_{-i}), \rho)$ is a subset of that in the observed game instance $((b_i, b_{-i}), \rho)$. So, the mechanism already knows all
the click information in the $i$-influential slots for the game instance $((x, b_{-i}), \rho)$. Since the payment scheme is
only interested in the allocation of agent $i$, the realization in the unobserved slots is unimportant and can be
assumed arbitrarily. Thus, the mechanism has complete information to compute $P_i((b_i, b_{-i}), \rho, t)$.
Step 2: Next, we prove the necessity of weakly separatedness by contradiction. That is, we assume (P) is true. 

Consider a complete realization $\rho(t)$ in round $t$ for which $(i^*, j^*)$ is strongly $i$-influential (such a realization exists
by our previous theorem) and construct the two complete realizations $\rho$ and $\rho'$ from $(\rho(1), \rho(2), \ldots, \rho(t-1), \rho(t))$
which only differ in $\rho_{i^* j^*}(t)$. Over all choices of counter-examples $(b_i, t, \rho(t), i^*, j^*)$, we choose the one which has
the smallest influenced round $t'$. Now, we compare the payment that the mechanism has to make for this game
instance at the end of $t'$ rounds under the two different realizations $\rho$ and $\rho'$.

Let $\varphi \in \{\rho, \rho'\}$. By the strong $i$-influence of $(i^*, j^*, t)$, the agent $i$ gets different allocations in round $t'$ under
the different realizations $\rho$ and $\rho'$. This implies,

$$\sum_j \mu_{ij} A_{ij}((b_i, b_{-i}), \rho, t') \neq \sum_j \mu_{ij} A_{ij}((b_i, b_{-i}), \rho', t').$$

Without loss of generality,

$$\sum_j \mu_{ij} A_{ij}((b_i, b_{-i}), \rho, t') > \sum_j \mu_{ij} A_{ij}((b_i, b_{-i}), \rho', t') \qquad (6)$$

(or agent $i$ gets a higher slot under realization $\rho$ than $\rho'$).

By the non-degeneracy of $A$, there exists a finite interval of bids about $b_i$ such that for every bid $x$ in this
interval,

$$A_{ij}((x, b_{-i}), \varphi, t') = A_{ij}((b_i, b_{-i}), \varphi, t') \quad \forall j \qquad (7)$$

Suppose $x' \in (0, b_i^+)$ is another bid such that the same slot-agent-round set $(i^*, j^*, t)$ is strongly $i$-influential
with the same influenced round $t'$ for the game $((x', b_{-i}), \rho)$. Then by the fairness of $A$,

$$\sum_j \mu_{ij} A_{ij}((x', b_{-i}), \rho, t') > \sum_j \mu_{ij} A_{ij}((x', b_{-i}), \rho', t') \qquad (8)$$

Footnote: The idea for our proof is similar to that in the characterization of the single-slot case [3]; however, the details are non-trivially
different.

From (6), (7) and (8), and using the fact that $t'$ is the smallest influenced round that is strongly $i$-influenced
by the bit $\rho_{i^* j^*}(t)$, which is the only differing bit between $\rho$ and $\rho'$, we can see that $\forall x' \in (0, b_i^+)$,

$$\sum_j \mu_{ij} A_{ij}((x', b_{-i}), \rho, t') \ge \sum_j \mu_{ij} A_{ij}((x', b_{-i}), \rho', t') \qquad (9)$$

and $\exists$ a finite interval $X$ around bid $b_i$ such that $\forall x \in X$, we have,

$$\sum_j \mu_{ij} A_{ij}((x, b_{-i}), \rho, t') > \sum_j \mu_{ij} A_{ij}((x, b_{-i}), \rho', t') \qquad (10)$$

From equations (9) and (10), and the fact that agent $i$'s allocation is the same under both realizations $\rho$ and
$\rho'$ until round $t'$ (from the smallest influenced round choice), we conclude that,

$$\sum_{t=1}^{t'} \int_{0}^{b_i^+} \sum_j \mu_{ij} A_{ij}((x, b_{-i}), \rho, t)\, dx > \sum_{t=1}^{t'} \int_{0}^{b_i^+} \sum_j \mu_{ij} A_{ij}((x, b_{-i}), \rho', t)\, dx$$

Additionally, we can assume that there are no clicks after round $t'$. As a result, we have $P_i(b_i^+, b_{-i}, \rho) \neq
P_i(b_i^+, b_{-i}, \rho')$. However, the mechanism cannot distinguish between the two realizations $\rho$ and $\rho'$, as the only
differing bit $\rho_{i^* j^*}(t)$ is unobserved. Hence, the mechanism fails to assign a unique payment to agent $i$. This is a
consequence of our initial assumption. Thus, if $A$ is not weakly separated, the payments are not computable.
This completes the proof.



3.4 When CTR is Separable 

In the previous setting we assumed that some pre-estimate of the CTR matrix $[\mu_{ij}]$ existed. In real world
applications, however, it is very often the case that the slot-dependent probabilities are known while the
agent-dependent probabilities are unknown. To leverage this fact, we make a widely accepted assumption: we assume
that the click probability due to the slot is independent of the click probability due to the agent. That is, we
assume that $\mu_{ij} = \alpha_i \beta_j$, where $\alpha_i$ is the click probability associated with agent $i$ and $\beta_j$ is the click probability
associated with slot $j$. We also assume that the vector $\beta = (\beta_1, \beta_2, \ldots, \beta_m)$ is common knowledge. In general
$\beta_1 \ge \beta_2 \ge \cdots \ge \beta_m$. Here, any mechanism will use the explorative rounds to try to learn the values of $\alpha_i$ as
accurately as possible.

Let $Y_{ij}$ denote the number of times that agent $i$ obtains the impression for slot $j$, and $X_{ij}$ denote the
corresponding number of times she obtains a click. Then, we define $\alpha'_i = \mathrm{avg}_j \left\{ \frac{X_{ij}}{Y_{ij} \beta_j} \right\}$ and $\mu'_{ij} = \alpha'_i \beta_j$.

In this section, we assume $\frac{\partial \alpha'_i}{\partial b_i} = 0$, or that $\alpha'_i$ does not change with bid $b_i$. We are justified in making this
assumption since $\alpha'_i$ is a good estimate of $\alpha_i$, which is independent of which slot agent $i$ obtains how many times.
By changing her bid $b_i$, agent $i$ can only alter her allocations, which should not predictably or significantly affect
$\alpha'_i$. It is trivial to see that $\frac{\partial \alpha'_i}{\partial b_i} = 0 \Rightarrow \frac{\partial \mu'_{ij}}{\partial b_i} = 0$.

We model truthfulness based on the utility gained by each agent in expectation over this $\mu'_{ij}$. That is, the utility
to an agent $i$ is given by equation (4) with $\mu$ being replaced by $\mu'$. With the above setup, it can easily be
seen that truthful mechanisms under this setting have the same characterization as the truthful mechanisms
with a pre-estimate of CTR.
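
The estimation step for the separable case can be sketched in a few lines of Python. The sketch assumes that the agent-specific estimate averages $X_{ij} / (Y_{ij} \beta_j)$ over the slots in which agent $i$ was actually shown, and that the separable estimate is the product of that average with the known $\beta_j$. The function name, the data layout, and the fallback value of 0 for an agent with no impressions are assumptions of this example, not details given in the text.

```python
from typing import List, Sequence, Tuple


def estimate_separable_ctr(
    clicks: Sequence[Sequence[int]],
    impressions: Sequence[Sequence[int]],
    beta: Sequence[float],
) -> Tuple[List[float], List[List[float]]]:
    """Return (alpha_est, mu_est) under the separable model mu_ij = alpha_i * beta_j.

    clicks[i][j] is X_ij, impressions[i][j] is Y_ij, and beta[j] is the known
    slot-dependent click probability.
    """
    alpha_est: List[float] = []
    for x_row, y_row in zip(clicks, impressions):
        # Each observed slot gives one estimate X_ij / (Y_ij * beta_j) of alpha_i.
        ratios = [x / (y * b) for x, y, b in zip(x_row, y_row, beta) if y > 0]
        # Assumption of this sketch: fall back to 0.0 when nothing was observed.
        alpha_est.append(sum(ratios) / len(ratios) if ratios else 0.0)
    mu_est = [[a * b for b in beta] for a in alpha_est]
    return alpha_est, mu_est
```

With $\beta = (1.0, 0.5)$, an agent who received 40 clicks in 100 impressions of the first slot and 20 clicks in 100 impressions of the second slot gets the estimate 0.4 from both slots, so the averaged estimate is 0.4 and the separable CTR estimates are 0.4 and 0.2.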

Theorem 3.3 Let $(A, P)$ be a mechanism for the stochastic multi-round auction setting where $A$ is a
non-degenerate, deterministic and fair allocation rule. Then, $(A, P)$ is truthful in expectation over $\mu'$ iff $A$ is weakly
pointwise monotone and weakly separated and the payment scheme is given by

$$P_i(b, \rho) = \sum_{t=1}^{T} \sum_{j=1}^{m} \mu'_{ij} \left\{ b_i A_{ij}(b, \rho, t) - \int_{0}^{b_i} A_{ij}(x, b_{-i}, \rho, t)\, dx \right\}$$

Proof: 

This theorem can be proven using similar arguments as used in the proof of Theorem 3.2, with $\mu$ being replaced
by $\mu'$.

4 Experimental Analysis



Since the single slot setting is a special case of the multi-slot setting, we obtain $\Omega(T^{2/3})$ as a lower bound for
the regret incurred by a truthful multi-slot sponsored search mechanism.

We have characterized truthful MAB mechanisms in various settings in the previous section. However, we
have not studied the regret of MAB mechanisms in multi-slot auctions (except for the
$\Theta(T)$ worst case bound we showed for the unconstrained case in Section 3.1). In this section, we present a brief
experimental study on the regret of a truthful MAB mechanism for multi-slot sponsored search auctions under
the separable CTR case.

For our study, we have picked a simple mechanism belonging to the separable CTR case. In the simulation,
we displayed the agents in the available slots in a round robin fashion for the first rounds. Then, we used
the observed information on the clicks to estimate the $\mu_{ij}$ values. The payments were computed as per Theorem
3.3.
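
The text does not spell out the exact round robin rule used in the simulation, so the following Python sketch shows only one plausible variant, under the assumption that the window of displayed agents shifts by one agent per exploration round. The function name and the rotation rule are illustrative assumptions.

```python
from typing import List


def round_robin_allocation(num_agents: int, num_slots: int, round_index: int) -> List[int]:
    """Return the agents displayed in slots 0..num_slots-1 in one exploration round.

    Hypothetical rule: the window of displayed agents shifts by one agent per
    round, so over num_agents consecutive rounds every agent is shown exactly
    once in every slot. Requires num_slots <= num_agents.
    """
    if num_slots > num_agents:
        raise ValueError("need at least as many agents as slots")
    start = round_index % num_agents
    return [(start + j) % num_agents for j in range(num_slots)]
```

With $k = 4$ agents and $m = 2$ slots, rounds 0 to 3 display the agent pairs (0, 1), (1, 2), (2, 3) and (3, 0), after which the pattern repeats.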

We performed simulations for various $T$ values with $k = 4$ and $m = 2$. For a fixed $T$, we generated $100T$
different instances, and estimated the average case as well as worst case regrets. In each instance, we generated
CTRs and bids randomly. Figure 1 depicts ln(worst case regret) and ln(average case regret). It is observed that
ln(worst case regret) and ln(average case regret) are each closely approximated by the logarithm of a constant
multiple of $T^{2/3}$, clearly showing that the worst case regret is $\Theta(T^{2/3})$ and the average case regret is upper bounded
by $O(T^{2/3})$.

5 Conclusion 

In this paper, we have provided characterizations for truthful multi-armed bandit mechanisms for various set- 
tings in the context of multi-slot pay-per-click auctions, thus generalizing the work of |21 H] in a non-trivial 
way. The first result we proved is a negative result which states that under the setting of unrestricted CTRs,
any strategyproof allocation rule is necessarily strongly pointwise monotone. We also showed that every
strategyproof mechanism in the unrestricted CTR setting will have $\Theta(T)$ regret. By weakening the notion of unrestricted
CTRs, we were able to derive a larger class of strategyproof allocation rules. Our results are summarized in the
Table m 

In the auctions that we have considered, the auctioneer cannot vary the number of slots he wishes to display. 
One possible extension of this work could be in this direction, that is, the auctioneer can dynamically decide 
the number of slots for advertisements. We assume that the bidders bid their maximum willingness to pay at 
the start of the first round and they would not change their bids till T rounds. Another possible extension 
would be to allow the agents to bid before every round. We are also exploring the cases where the bidders have 
budget constraints.